CACHING ON THE INTERNET
By Lisa Sanger
Spring 1996
--------------------------------------------------------------------------------
INTRODUCTION
A newly joined Prodigy member logs on to the service for the
first time and it takes several minutes for her computer to download the fancy
homepage graphics. The next day she logs on again and this time the graphics
pop up almost immediately. She wonders: Why did her computer perform the same
download faster the second day?
While browsing the Internet, a Web surfer notices that his
computer retrieves documents from the popular Electronic Frontier Foundation
website much faster than it retrieves documents from his favorite, if rather
eccentric, New Trends in Underwater Basket Weaving website. He wonders: Why
does it take longer to download a less popular and less congested website?
HowTired, a popular online magazine, notes a sudden and
inexplicable drop in "hit" rates (i.e. popularity statistics).
HowTired wonders: Why have we lost these hits?
The answer to these three questions is: caching. Among other
things, caching can speed up downloading on one's personal computer; caching
can speed up connection time to popular sites on the Internet; and caching can
mask websites' hit rates. Any person active on the Internet has probably both
cached documents herself and received cached data from elsewhere on the
Internet. Caching is widespread on the Internet, but it raises thorny issues
under U.S. Copyright law.
In this paper I examine caching, the implications of U.S.
Copyright law and potential resolutions. Part I defines the term caching. Part
II discusses the benefits and drawbacks of caching on the Internet. Part III
specifically explores whether caching constitutes copyright infringement. And,
finally, Part IV considers whether proxy caching should continue.
I. CACHING: DEFINING THE TERM
Caching is a generic term meaning "to store."
Whether a person caches web pages to avert Internet traffic, or a squirrel
caches nuts for the winter, the concept is the same. When applied to the
Internet, "caching" means "the copying of a web page, made incidental
to the first access to the page, and storage of that copy for the purpose of
speeding subsequent access."
There are two ways to cache web pages on the Internet:
"client caching" and "proxy caching." Important functional
differences exist between client caching and proxy caching and these
differences may in some instances influence the legal analysis. Unfortunately,
people often forget to specify what type of caching they are referring to,
leaving the entire discussion of caching quite muddled. In an effort to clarify
the muddle, I will review the definitions of client caching and proxy caching.
A. Client Caching
Client caches reside within an individual user's Web browser
software (such as Netscape or Mosaic). The client cache stores not only the
documents currently displayed in browser screens, but also documents requested
in the past. Client caching takes two forms: persistent and non-persistent. A
persistent client cache retains its documents between invocations of the Web
browser. Netscape uses a persistent cache. A non-persistent client cache (used
in Mosaic) removes any memory or disk space used for caching when the user
quits the browser.
The client caching process works similarly whether it
maintains a persistent or a non-persistent cache. When the user's computer requests
a website, the computer will first check to see if the data requested already
resides in the cache. If the cache has a copy of the requested data, then the
cache provides the data very quickly to the user. If the data is not in the
cache, the computer fetches the item needed from the Internet, and also stores
a copy in the cache. Now the cache has this data available if the user
requests it again. The larger the cache, the more data the cache can store, and
the more likely the cache will have the requested item.
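The lookup sequence just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names ClientCache and fake_internet are hypothetical, the "Internet" is simulated by a plain function, and the sketch does not claim to describe how Netscape or Mosaic actually implements its cache.

```python
class ClientCache:
    """Toy model of a browser's client cache (hypothetical, for illustration)."""

    def __init__(self, fetch_from_internet):
        # fetch_from_internet: assumed callable taking an address and
        # returning the document found there.
        self._fetch = fetch_from_internet
        self._store = {}           # address -> cached copy
        self.network_fetches = 0   # times we had to go out to the Internet

    def request(self, address):
        # Step 1: check whether the requested data already resides in the cache.
        if address in self._store:
            return self._store[address]
        # Step 2: otherwise fetch the item from the Internet and keep a copy.
        self.network_fetches += 1
        document = self._fetch(address)
        self._store[address] = document
        return document


def fake_internet(address):
    # Stand-in for a slow network download (an assumption of this sketch).
    return "contents of " + address


cache = ClientCache(fake_internet)
first = cache.request("http://example.org/home")   # goes to the "Internet"
second = cache.request("http://example.org/home")  # answered from the cache
```

In this sketch the second request never touches the network, which is why the Prodigy member in the Introduction sees her graphics appear almost immediately on the second day.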
B. Proxy Caching
The second form of caching, "proxy caching," takes
place on a network used by the World Wide Web ("WWW" or
"Web"). Proxy caches reside on machines in strategic places
(typically gateways) in the network of the WWW. Both non-profit local access
networks, and large for-profit service providers such as Prodigy, run proxy
caches. Unlike client caching, which services only one client, proxy caches
service many clients. Thus, proxy caching helps relieve the intense congestion
now plaguing the Internet, and does so on a much grander scale than client caching.
Proxy servers act as intermediaries between local clients
and remote content servers. Initially, scientists developed proxy servers to
function as firewalls for security reasons. Today proxy servers may function as
caching mechanisms as well as firewalls. A proxy server is a machine (or
collection of machines) through which all traffic must pass.
When a user asks a client for a certain web page, the client
heads out to the Internet. If there is a caching proxy, client requests go to
the proxy server, not to the remote web page. The proxy checks to see if it has
already cached the requested page on the proxy server. If the server has a
cached copy of the web page, the server returns the page to the client directly
(see Figure 1 below). Reporting cached information to clients occurs rapidly
because it requires reduced Internet activity. In addition to helping
individual clients and networks with proxy servers, caching also helps the
remote web page server. Caching reduces the computational load on the remote
content server and makes it possible for that server to supply data to many
more machines. If the proxy server does not have a cached copy of the
requested document, it goes out to the remote web page server, finds
the original, and passes the data back to the client, at the same time keeping a
copy on its cache (see Figure 2 below). The "cache proxy" or
"proxy cache" server is, at one and the same time, a
"proxy" (it accepts requests from clients, and carries them out on
the clients' behalf) and a "cache" (it keeps a copy of the documents
that it retrieves, and fulfills subsequent requests from that copy where
appropriate).
Figure 1. (omitted in HTML version)
Figure 2. (omitted in HTML version)
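The request flow described in this section (a hit is served from the proxy, a miss is forwarded to the remote server and then stored) can be sketched as follows. The class name CachingProxy, the origin_fetch callable, and the client names are all hypothetical; a real proxy server would also handle expiry, errors, and uncacheable pages, none of which is modeled here.

```python
class CachingProxy:
    """Toy caching proxy shared by many clients (illustrative assumption)."""

    def __init__(self, origin_fetch):
        # origin_fetch: assumed callable that retrieves a page from the
        # remote content server.
        self._origin_fetch = origin_fetch
        self._cache = {}
        self.origin_requests = 0  # requests that actually reached the remote server

    def handle(self, client_id, url):
        """Return (page, source), where source is 'cache' or 'origin'."""
        # client_id is not used for the lookup: every client shares one cache.
        if url in self._cache:
            return self._cache[url], "cache"
        # Miss: act as a proxy, fetch on the client's behalf, keep a copy.
        self.origin_requests += 1
        page = self._origin_fetch(url)
        self._cache[url] = page
        return page, "origin"


proxy = CachingProxy(lambda url: "page at " + url)
sources = [proxy.handle(client, "http://example.org/")[1]
           for client in ("alice", "bob", "carol")]
```

Only the first client's request reaches the remote server; the other two are answered from the shared copy, which is the difference between a proxy cache and a client cache that serves a single user.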
C. Distinguishing Caching from Archiving
In everyday practice, the line between the caching process
and the archiving process can become gray. However, technically speaking, the
two processes have distinct differences.
Caching entails a more automated process than archiving. A
cache simply copies data which passes through it on the way to the client. The
cache retains the copy in case the client, or another client in the same
network, needs the document again. After a certain amount of time, the cache
needs more space, so it clears the least recently used, or least used, page to
make room for another requested document. The caching process -- theoretically
-- maintains constant updating throughout the day. Caching's automated mass
copying strives primarily to speed access to popular sites on the Internet.
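The "least recently used" replacement policy mentioned above can be illustrated with a short sketch. It assumes a cache whose size is measured in number of pages rather than bytes, and the class name LRUCache is hypothetical; actual cache servers may use other policies, such as the "least used" alternative also mentioned above.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used page."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of pages (assumption)
        self._pages = OrderedDict()   # oldest use first, newest use last

    def get(self, url):
        if url not in self._pages:
            return None
        self._pages.move_to_end(url)  # mark as most recently used
        return self._pages[url]

    def put(self, url, page):
        self._pages[url] = page
        self._pages.move_to_end(url)
        if len(self._pages) > self.capacity:
            # The cache needs more space: clear the least recently used page.
            self._pages.popitem(last=False)

    def urls(self):
        return list(self._pages)


cache = LRUCache(capacity=2)
cache.put("a.html", "A")
cache.put("b.html", "B")
cache.get("a.html")        # a.html is now the most recently used
cache.put("c.html", "C")   # the cache is full, so b.html is evicted
```

Nothing in this process involves a human decision about what to keep, which is the sense in which caching is more automated than archiving.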
Archiving involves a more manual process where an individual
affirmatively goes out and copies an entire other website or section of a
website (not just separate pages requested by clients) onto her local server.
Archiving stores a server's input while caching stores the server output to
clients. Also, archiving updates much less frequently than caching. Archiving
aims to compile a library or historical resource of sorts; it does not aim to
substitute copies in order to preserve bandwidth. In fact, mass archiving can
consume a great deal of resources.
The line between caching and archiving or replicating
becomes gray when a network or service provider configures its caching proxy
server to permanently hold the most frequently asked for sites and keep them
current. In this scenario the proxy server just permanently caches these pages
and updates them either at specified intervals or whenever the TTL (Time to
Live) encoded in the page expires. Although the line between caching and
archiving or replicating blurs at certain points, it is important for the
reader to note that caching does refer to a distinct process.
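A minimal sketch of the TTL-based updating described above follows. It simplifies in two labeled ways: the TTL is a single value set on the cache rather than one encoded in each page, and the clock is passed in as a number of seconds so that the example is deterministic. The names TTLCache and origin are hypothetical.

```python
class TTLCache:
    """Toy cache that refreshes a page once its time to live has expired."""

    def __init__(self, fetch, ttl_seconds):
        self._fetch = fetch        # assumed callable that retrieves the current page
        self._ttl = ttl_seconds    # one TTL for every page (a simplification)
        self._entries = {}         # url -> (page, time the copy was made)
        self.refreshes = 0

    def get(self, url, now):
        entry = self._entries.get(url)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # copy is still considered current
        # No copy yet, or the TTL has expired: fetch a fresh copy.
        page = self._fetch(url)
        self._entries[url] = (page, now)
        self.refreshes += 1
        return page


url = "http://example.org/news"
origin = {url: "edition 1"}                      # simulated remote website
cache = TTLCache(lambda u: origin[u], ttl_seconds=1800)

at_start = cache.get(url, now=0)        # first access: fetched from origin
origin[url] = "edition 2"               # the website changes its content
ten_min = cache.get(url, now=600)       # within the TTL: old copy is served
thirty_min = cache.get(url, now=1800)   # TTL expired: fresh copy is fetched
```

Between the change on the website and the expiry of the TTL, the cache in this sketch keeps serving the earlier edition.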
II. CACHING: BENEFITS AND DRAWBACKS
A. THE PROBLEM
Each WWW address specifies or implies a reference to one
particular site on the Internet. This means that without some kind of
additional machinery, whenever a person requests a specific WWW address, no
matter where she is from and no matter how often others in her network request
the same address, she will make a network call to that specific site, leading
to unnecessarily high use of network links and excessive load on the servers
for popular sites.
B. CACHING BENEFITS
High use of network lines and excessive load on popular
servers lead to one of the biggest problems experienced by Internet
users today: lack of adequate bandwidth. Information abounds on the Internet,
but the delay involved in retrieving that information frustrates many users.
Until the Internet infrastructure upgrades to bigger "pipes" which
can transmit greater amounts of information in the same amount of time, Web
surfers must look to other means to relieve the congestion. Caching helps to
relieve Internet congestion in five ways: 1) caching expedites user access
time; 2) caching decreases the amount of bandwidth each user uses; 3) caching
decreases bandwidth used on the Internet generally; 4) caching decreases
bandwidth used on network servers; and 5) caching decreases bandwidth used on
remote servers. Caching creates social benefits (by reducing traffic on the
Internet generally) and it creates private benefits (by saving time and
conserving bandwidth for individual users and servers). Thus, caching improves
the Internet for everyone.
C. CACHING DRAWBACKS
Caching has three main drawbacks: 1) caching inhibits
websites' ability to calculate hits and page impressions; 2) caching may result
in promulgation of stale documents; and 3) caching may constitute copyright
infringement.
1. Hits and Page Impressions. A "hit" is a form of
measurement on the WWW. Websites measure one "hit" for each time a
client (or proxy server) requests a file from their server. Note that the
number of files on a web page varies greatly from page to page. The use of
graphics adds many files to a single page. Thus, if a user requests a page
without any graphics, the website may measure only a single hit, but if a user
requests a page with many graphics, the website may measure 30 or more hits.
Instead of hits, Websites may also count "page impressions" as
measurements of usage. Websites using this system measure one "page
impression" for each time a user or proxy server requests a page from a
website's server. Page impressions avoid the file discrepancy problem that
occurs with hits.
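The difference between the two measurements can be made concrete with a small worked example. The page names and file counts below are hypothetical, chosen to match the figures in the text (one file for a text-only page, 30 files for a graphics-heavy page).

```python
def count_hits(files_per_page, page_requests):
    # One hit for every file requested: the page itself plus each graphic.
    return sum(files_per_page[page] for page in page_requests)


def count_page_impressions(page_requests):
    # One page impression for every page requested, however many files it has.
    return len(page_requests)


# Hypothetical site with one text-only page and one graphics-heavy page.
files_per_page = {"text_only.html": 1, "graphics_heavy.html": 30}
page_requests = ["text_only.html", "graphics_heavy.html"]

hits = count_hits(files_per_page, page_requests)
impressions = count_page_impressions(page_requests)
```

Two page requests yield 31 hits but only 2 page impressions, which is the file discrepancy that page impressions avoid.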
The problem with caching and hits or page impressions is
that when a client or proxy server caches a page, the website logs only one
access. The remote content server will never know that ten, a hundred or a
thousand users have accessed a particularly popular document based on a single
access by a proxy cache! Hits and page impressions offer valuable information
to websites -- they help websites gauge usage and understand their users'
preferences, and more importantly, websites use hit and page impression
statistics to sell advertising space. Websites that log more hits and/or page
impressions can sell more valuable advertising space. Many websites provide
information to users free of charge and rely on selling advertising space (such
as banners) in order to generate revenue. Since advertising generates their
revenue, websites prize their hit and/or page impression data and dislike the
fact that caching deprives them of this information.
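The undercount can be illustrated with a short calculation. The sketch assumes the simplest case: every user sits behind the same proxy, and the proxy never refreshes its copy during the period measured. The function name logged_accesses and the address howtired.example are hypothetical.

```python
def logged_accesses(requests, behind_proxy):
    """Number of accesses the remote content server records in its log."""
    if not behind_proxy:
        # Every user request reaches the website directly.
        return len(requests)
    # Behind a caching proxy, only the first request for each address
    # reaches the website; the rest are answered from the cached copy.
    return len({url for _user, url in requests})


# One thousand different users request the same popular page.
requests = [("user%d" % i, "http://howtired.example/cover") for i in range(1000)]

direct = logged_accesses(requests, behind_proxy=False)
proxied = logged_accesses(requests, behind_proxy=True)
```

Under these assumptions the website logs 1,000 accesses when users connect directly and a single access when they connect through the proxy.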
While websites often prize their hit and/or page impression
data, it is important to note the flaws of these measurements. I have already
noted one flaw: hits do not distinguish between a one-file page and a 30-file
page, and thus graphic-laden pages can greatly skew statistics if people rely on
hits as a metric. Secondly, neither hits nor page impressions can tell how long
a user viewed a page. Both methods register equal values whether a user viewed
a page for an hour, or simply clicked through in a split second in order to
reach a different destination. Thirdly, neither method can distinguish between
users. A website measures the same 100 hits or page impressions whether made
from one crazed fan or 100 separate fans. Advertisers obviously would prefer to
reach 100 separate fans. Due to these flaws, web-savvy people know to account
for the dubious figures of hit and page impression statistics. These usage
statistic techniques are simply the best technology currently offers (without
requiring users to log in and identify themselves or invest in additional tools
to glean insights, e.g. IP address analysis). Advertisers have learned to
adjust to these flaws of the hit and page impression calculations. Advertisers
will likely also find ways to adjust to caching's twist on hit and page
impression statistics.
2. Stale Documents. Caches supposedly update regularly and
frequently. However, no fixed schedule exists whereby caching servers guarantee
to keep their cached copies current. Vigilant cache servers may update every 30
minutes. Delinquent cache servers may not update for days or weeks.
When users request information from a remote website, they
may in fact receive that information from a cache. If the cache information is
stale (i.e. the remote website has changed its content since it was cached) the
user has received, at best, outdated information and, at worst, harmful and
misleading information. The degree of the threat of stale information depends
on the nature of the website's content. If a user requests today's Dilbert
cartoon, but receives yesterday's cartoon because the cache has not updated
yet, the user suffers little harm beyond annoyance. But what happens if a user
invests her money based on a cached page of the NYSE ticker page? Should she
bet her money on old (even if it is only 30 minutes old) information?
Additionally, what happens if a website posts information (e.g. liability
inducing speech such as defamatory or obscene statements) which it pulls down a
few hours later, but a server has already cached the information? Now the
liability inducing information does not exist on the website, but it lives on
in a cache. If a lawsuit over the information should emerge, whom should a court
hold liable? Caching causes websites to lose control of their content.
Proliferation of outdated information on a medium known for its immediacy
constitutes a large drawback for the process of caching.
3. Copyright Infringement. Caching web pages involves
copying other people's intellectual property and may constitute copyright
infringement. In order to determine whether caching constitutes a copyright
infringement, one must closely examine whether caching meets the criteria for
infringement and then whether cachers are eligible for the affirmative defense
of fair use. I have dedicated Part III of this paper to my copyright
analysis.
III. IS CACHING COPYRIGHT INFRINGEMENT?
A. WHAT IS A COPYRIGHT?
A copyright is a right of intellectual property. Copyright
grants authors, for a limited time, certain exclusive rights to their works.
Copyright is exclusively federal law, and derives from the "copyright
clause" of the Constitution, which provides that "The Congress shall
have Power ... To promote the Progress of Science and useful Arts, by securing
for limited Times to Authors and Inventors the exclusive Right to their
respective Writings and Discoveries."
To be eligible for copyright protection, a work must meet
two conditions: 1) it must be an original work of authorship, and 2) it must
be fixed in a tangible medium of expression.
The U.S. Copyright Act grants a copyright owner the
exclusive right to do and to authorize any of the following: (1) to reproduce
the copyrighted work in copies; (2) to prepare derivative works based upon the
copyrighted work; (3) to distribute copies to the public; (4) to perform the
copyrighted material publicly; (5) to display the copyrighted work publicly;
and (6) to digitally perform the work.
Caching can encroach on most of the copyright holders' six
exclusive rights. First, both proxy caching and client caching implicate the
copyright holders' reproduction rights because they both "reproduce"
a copy into their caches. Second, proxy caching implicates the public display,
public performance, and digital performance rights. To perform or display a
work "publicly" means to transmit a performance or display of the
work to the public (i.e. to a substantial number of people outside of a normal
circle of a family and its social acquaintances), by means of any device or
process, whether the members of the public receive the performance in the same
place or in separate places and at the same time or at different times. Proxy
caching makes its cached copy available to all those who use the proxy, which
clearly places proxy caching within the definition of public display, public
performance, and digital performance (the nature of the work, e.g. music or
literature or computer program, determines which of these three rights are
implicated). Third, proxy caching encroaches on copyright holders' distribution
rights. The United States' Task Force on Intellectual Property states that
making copies of a copyrighted work widely available online constitutes
infringement of the copyright holder's distribution rights. Proxy caches
regularly make copyrighted works widely available online to all their clients.
B. WHAT CONSTITUTES COPYRIGHT INFRINGEMENT?
To sustain a claim for copyright infringement, a plaintiff
must show a) ownership of the copyright and b) copying of the protected
material. A plaintiff may demonstrate ownership of her copyright by simply
registering her work. One can register before or after the infringement has
occurred, so a plaintiff can meet this requirement easily. In the case of
caching, a plaintiff can also meet the second requirement, a showing of
"copying," easily because caching, by definition, is copying.
Based on the ease of showing both a) ownership of one's copyright
and b) copying of the protected material in the caching context, caching
certainly constitutes copyright infringement. However, the copyright analysis
of caching does not end here. Even when a plaintiff clearly establishes
copyright infringement, a defendant may assert the affirmative defense of
"fair use."
C. FAIR USE: AN AFFIRMATIVE DEFENSE TO COPYRIGHT
INFRINGEMENT
Recall that the U.S. Constitution states that copyright law
exists in order to promote science and the useful arts. The "fair
use" doctrine allows courts to avoid rigid application of the copyright
statute when a finding of infringement would actually serve to inhibit the very
artistic and scientific activity copyright law strives to foster. The fair use
statute states: "Fair use of a copyrighted work, including such use by
reproduction in copies ... or by any other means specified by that section, for
purposes such as criticism, comment, news reporting, teaching (including
multiple copies for classroom use), scholarship, or research, is not an
infringement of copyright." The statute declares that a court should
consider the following four factors when determining whether a use of a
copyrighted work constitutes a "fair use": (1) the purpose and
character of the use; (2) the nature of the copyrighted work; (3) the amount
and substantiality of the portion used in relation to the copyrighted work as a
whole; and (4) the effect of the use upon the potential market for or value of
the copyrighted work. Fair use entails a multi-factor analysis where no factor
dominates. Below I break down each of the four factors of fair use, relate each
factor to the web caching context, and then specifically apply the factor to a
hypothetical situation in order to supply at least one "real life"
example of the caching copyright conflict. The hypothetical situation involves
two characters: 1) USOL, a popular Internet service provider with millions of
members, and 2) "HowTired," a hip online magazine which registers
upwards of 500,000 hits per day. Because HowTired is so popular with USOL's
users, USOL caches HowTired's entire site on a regular basis.
1. Purpose and Character of the Use. In considering the
purpose and character of the use, courts have looked at whether the use is for
commercial or for non-profit or educational purposes. If the defendant uses the
copyrighted work commercially, the use is less likely to be fair use. If the
defendant uses the same work in a non-profit school to teach, the use is more
likely to be fair use. Some courts might forgive commercial use if the use is a
"productive" or "transformative" use. Transformative uses
are uses that add value to the material taken from the copyrighted work. The
Supreme Court has noted that this distinction between transformative and
nontransformative uses is not wholly determinative, but can be considered when
a court is balancing interests.
The distinction between client caching and proxy caching
commands different analyses regarding the purpose and character factor. Client
caches, which cache only to one client, usually do not serve any commercial
purpose. On the other hand, proxy caches may cache precisely for commercial
purposes. Many networks which run proxy caches charge for their services.
Quicker access to more popular pages can warrant higher service fees. Consumers
value speedy access. Moreover, caching saves network resources. It is more
efficient to provide multiple copies direct from a proxy server than to
repeatedly traverse the Internet to provide single copies from remote servers.
In fact, one may analogize proxy caching to the Texaco case where a corporation
made photocopies of professional journal articles for its researchers' personal
files. In Texaco, the court found the defendant liable for copyright
infringement for photocopying costly trade journals for staff members who
wished to conveniently keep copies in their offices rather than trekking to the
company library. The infringing photocopies benefited the company in two ways:
1) the company saved money by not paying for extra copies of costly trade
journals; and 2) the company's employees got to work more efficiently because
of the local copies. Analogously, caching benefits service providers by: 1)
saving money because they can avoid buying more computer and communications
equipment; and 2) giving the users better performance.
One may argue that caching has a transformative use. Caching
adds value to the copyrighted material taken because faster Internet service
produces a value that benefits the broader public interest. The more
user-friendly the Internet, the more people will use it. Thus, caching
encourages the growth of this promising young communications network. This
argument may carry some weight, but it must be balanced with the
commercial/non-commercial use analysis. Transformative use alone does not
dictate a finding of fair use.
Extrapolating to the hypothetical situation where USOL caches
HowTired, the purpose and character factor would indicate that the use is not
fair use. USOL is a commercial operation. Its cached copy of the HowTired
website enhances its commercial services. Through caching, USOL preserves its
own resources (caching requires less telecommunications hardware, thereby
reducing costs) and improves its Internet service by accelerating user access
time. USOL's cached copies do not serve any direct educational purpose. One
might argue that while USOL's caching does not serve an educational purpose, it
does have a transformative use. The argument follows: easing access to content
on the WWW furthers copyright law's goal of promoting science and the arts,
thus, even if USOL's cache runs for profit, the immediate effect of caching
serves the public benefit. While this argument has some merit, I believe it is
out of place in the context of the "purpose and character" factor.
Caching speeds access to the Internet, but USOL could also speed access by
buying more hardware and faster modem connections. USOL chooses to cache
because it better serves its commercial purposes.
2. Nature of Copyrighted Work. The second factor the courts
use in evaluating fair use is whether the nature of the copyrighted work is
factual or fictional. Courts consider copies of factual works more easily
subject to fair use copying based on the principle that copyright law protects
expression, not facts. Owners of fictional or artistic works, which are more
expressive than factual works, have a stronger claim to their copyrights. Thus,
one who copies a fictional or artistic work will have a much more difficult
time claiming fair use.
Web page content varies tremendously on the Internet. Some
pages simply catalog contact information such as phone numbers and addresses
(e.g. www.Four11.com or www.switchboard.com). According to the Feist case, phone
numbers constitute unprotectable facts. Some web pages display original
literature and/or digital works of art and would warrant higher copyright
protection. Obviously, in the context of web page caching, the actual content
of the web page copied will influence whether the person copying may
successfully claim the fair use defense.
When evaluating this second factor of fair use, courts also
examine whether the work has been published. Courts usually consider use of an
unpublished work as more likely to infringe than an analogous use of a
published work. Courts grant more protection to unpublished works because: a)
they have not yet taken advantage of their valuable right to first publication;
and b) unauthorized use affects the copyright holder's ability to choose not to
publish the work at all.
Although posting one's work on the WWW differs slightly from traditional
publishing, a court is likely to find that all works posted on the WWW are
"published" for the purposes of copyright law. The WWW is a very public forum;
it has a world-wide audience whose numbers far exceed most audiences for
traditionally published works. A copyright owner who posts her work on the web
has already made use of her right to first publication by making the work so
widely available and has already obviously chosen to make the work public.
Since all works on the WWW qualify as published and therefore less needy of
protection, this element would cut in favor of a court finding caching a fair
use.
Additionally, some have tried to argue that by
posting/publishing one's work on the Internet, one has granted the public an
"implied license" to cache the copyrighted work. According to the
law, a copyright holder cannot transfer ownership of any of her rights on an
exclusive basis without a written agreement. However, a court may imply a grant
of a nonexclusive license from a copyright holder's conduct. A copyright holder
must exhibit conduct which leads another to reasonably believe that the
copyright holder issued an authorization to copy or a waiver of her rights.
Those who argue that an "implied license" to cache exists assert that
because copying and caching pervade the Internet, simply posting one's material
in such an environment constitutes "conduct" indicative of an intent
to license the public to cache one's material. In fact, a court is unlikely to
find that simply posting one's work on the Internet is conduct enough to imply
the broad license to cache by any member of the public. Posting on the Internet
may constitute an implied license to copy in some situations. For instance,
since it is necessary for a user to copy a web page into RAM in order to view a
web page, and since a user may reasonably assume that one who posts a web
page wishes others to view it, a court may find that web page owners have
granted users an implied license to copy works into RAM. However, a person may
not reasonably assume that simply by posting one's work on the Internet, one
wishes to subject her work to a cache which may deprive it of hit statistics
and/or forward it to others in a stale and inaccurate form.
Although works on the WWW qualify as published and therefore
less needy of copyright protection, this factor is unlikely to be dispositive
in a fair use analysis of caching. I doubt a court would ever declare a copy of
a copyrighted work fair use just because the copyrighted work was published.
And, without conduct beyond simply posting one's material on the Internet, a
court should not find an implied license to cache. At most, publishing would
weigh in as one of many elements which would tip the scales in favor of finding
a fair use.
The best argument cachers can assert under factor two in
favor of fair use may be found in the Netcom case. In Netcom, the court found
the nature of the works copied irrelevant because Netcom made copies merely to
facilitate their posting. The content of the material copied made no difference
to Netcom. Similar to Netcom, cache servers typically make copies merely to
facilitate access to the information. The content of the material copied makes
no difference to cachers. Thus, cachers can strongly argue that the precise
nature of the works copied is not important to the fair use determination.
In the USOL/HowTired situation, USOL caches HowTired solely
to facilitate user access to the popular WWW site. It would not matter to USOL
if HowTired contained pure factual reports or creative fiction. Thus, a court
should analogize the USOL caching situation to the Netcom situation, and
proclaim this factor unimportant to the fair use determination.
3. Amount and Substantiality of the Copying. As a third
factor in the fair use analysis, courts consider the amount and substantiality
of the portion of the copyrighted work used in relation to the copyrighted work
as a whole. In some cases, a court may consider a use which incorporates 90% of
another copyrighted work less likely to be a fair use than a use which only
incorporates 10% of another copyrighted work. However, this third factor
entails a more complex analysis than just calculating percentages. A court may
deny a finding of fair use even in the instance where one copies only 10% if
that 10% is "qualitatively substantial." A use is qualitatively
substantial if the portion copied goes to the heart of the work. Moreover,
while a court can find use of a minor percentage "substantial," a
court may also find use of a major percentage insubstantial and eligible for
fair use. In the Sony case (in which the Supreme Court found that off-air
non-archival videotaping of broadcast television was a fair use), the court
found fair use despite an exact and total copy of the original. In the Sega
case, the court also found fair use despite an exact and total copy of the
original because the defendant needed to make the copy in order to study the
original (the original work here was a computer program).
In order to evaluate whether a cached web page constitutes a
substantial copy of the copyrighted work, the court must first define what
constitutes the work in question. A client or proxy cache may only copy one
page or one section of a website. Is caching one web page or section like
taking a page or chapter out of a book (which would constitute a small portion
of the work in question) or like taking an entire discrete article out of a
magazine? Usually, a cache server only copies material a user requests. In most
cases, I suspect that users request at least one discrete section of a website,
not single pages out of their context. Thus, caching seems more analogous to
copying an entire article out of a magazine. Copying an entire discrete section
rather than a page obviously constitutes a more substantial copy and renders
the use less likely to be fair use.
Some might argue that the fair use videotaping in Sony for
the purpose of time shifting is analogous to caching material because caching
serves time shifting purposes too (people cache material so they can view it at
a later time if they access the site again). This argument may be persuasive in
the client caching situation, especially in cases where an individual uses
software specifically for the purpose of time shifting. However, the analogy to
Sony does not work in the proxy cache situation. Proxy servers do not cache to
shift times; they cache to make the material available for others to use.
In the USOL/HowTired hypothetical, USOL caches the entire
HowTired site. So, USOL has little room to argue that its cached copies amount
to an insubstantial amount of the copyrighted work. Further, because USOL's
cache in this situation is a proxy cache (although USOL also offers its users a
client cache system in its Internet browser), USOL cannot analogize to Sony.
USOL caches HowTired to provide quicker access to the HowTired site to USOL
users. USOL does not cache in order to time shift. USOL may try to analogize to
Sega as the court did in Netcom. In Netcom, the court excused a 100% copy
because, as in Sega, " ... Netcom had no practical alternative way to
carry out its socially useful purpose; a Usenet server must copy all files
..." This analogy will be difficult for USOL to carry off because USOL
does not need to cache in order to make the information available to its users.
Investing in more telecommunications equipment to speed user access time would
be a practical alternative to caching for USOL.
4. Effect of the Copying on the Market. The U.S. Supreme
Court has stated that the fourth factor is the "single most
important" in evaluating fair use. The fourth factor dictates that if the
use affects the market for the copyrighted work, the court should be less
likely to hold the use as a fair use.
It is difficult to identify the market in the case of
caching because, in most cases, one does not pay to access a person's website.
What market exists for material the owner distributes for free? Some argue that
if a website makes its material available for free, there is no market for the
material which copying can encroach upon. According to this argument, caching,
by making websites more accessible, enhances the copyright holder's market.
This argument rings false. While the direct consumers receive the material for
free, website material can create an undeniable market for advertising dollars.
The more desirable a website's material, the more dollars it can command for
advertising space on its site. This advertising market model should sound familiar
to any person who listens to the radio or watches television. Like many
websites, many radio and television stations provide content to their audience
for free and earn revenue by selling advertisers time to market their
product(s) to the audience.
Caching hinders the market for advertising space on web
pages in two ways. First, caching obscures hit and page impression statistics.
Many websites form advertising contracts by guaranteeing or selling a certain
number of hits or page impressions (i.e. the website promises to post an
advertisement until the website logs the contracted number of hits or page
impressions). Caching invalidates the statistics and thereby greatly affects
the advertising market created by the copyright owner's work. Second, caching
interferes with the copyright owner's control over her product. The copyright
owner sells advertising space generated by her work. She may sell advertising
space on a very tightly run schedule (e.g. one hour on the top page, 3 hours on
the sports page, etc.). If a server caches a copy at 12:00, all the users of
that cache (one for a client cache or up to thousands for a proxy cache) would
see the advertisement posted at 12:00 even if they accessed the page at 12:30
or 1:00. Copyright owners could no longer effectively sell advertisers a
certain space at a certain time.
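Both effects can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the class names (OriginServer, NaiveProxyCache), the hourly advertising schedule, and the advertiser names are hypothetical, and the proxy is assumed to keep its first copy indefinitely, which is a deliberate simplification of how real proxy caches behave.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class OriginServer:
    """Hypothetical content site that rotates ads hourly and logs each hit."""
    ads_by_hour: Dict[int, str]
    hits: int = 0

    def fetch(self, hour: int) -> str:
        # Only requests that actually reach the origin are counted.
        self.hits += 1
        return f"page with ad: {self.ads_by_hour[hour]}"


@dataclass
class NaiveProxyCache:
    """Hypothetical proxy that fetches once and then serves its stored copy."""
    origin: OriginServer
    stored: Optional[str] = None

    def get(self, hour: int) -> str:
        if self.stored is None:
            self.stored = self.origin.fetch(hour)
        # Every later reader gets the stored copy; the origin never hears of it.
        return self.stored


def simulate(readers_at_noon: int, readers_at_one: int) -> Tuple[int, List[str]]:
    """Return (hits logged by the origin, pages actually shown to readers)."""
    origin = OriginServer(ads_by_hour={12: "Advertiser A", 13: "Advertiser B"})
    proxy = NaiveProxyCache(origin)
    pages = [proxy.get(12) for _ in range(readers_at_noon)]
    pages += [proxy.get(13) for _ in range(readers_at_one)]
    return origin.hits, pages


hits, pages = simulate(readers_at_noon=1, readers_at_one=999)
print(hits, len(pages), pages[-1])
```

Under these assumptions the origin logs a single hit although one thousand readers viewed the page, and the readers who arrive at 1:00 still see the advertisement that was scheduled for 12:00.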
In the USOL/HowTired hypothetical, USOL's proxy cache masked
many thousands of hits for HowTired. HowTired provides its content for free;
advertising is its sole source of revenue. HowTired guarantees its advertisers
both a certain amount of display time online and a certain number of hits.
USOL's caching impedes HowTired's ability to extract value when it makes such
contracts with advertisers. As a result, HowTired may sell less advertising and
lose revenue. By impeding the advertising market, caching discourages HowTired
and other sites like it from operating. Caching damages the market for their
method of business. In this way, caching runs counter to the Copyright Act's
goal of encouraging creativity.
D. FAIR USE CONCLUSION
The multi-factor fair use analysis allows courts to reach
very subjective conclusions. In the context of caching, courts can interpret
the "purpose and character" factor as commercial or transformative.
Analysis of the "nature of the copyrighted work" factor varies
greatly depending on the content of the underlying web page and the cacher's
purpose. The "amount and substantiality" factor requires complex
quantitative as well as qualitative analysis. The final "effect upon the
market" factor depends on how one chooses to define the relevant market.
Therefore, no sweeping generalizations can be meaningfully made about whether
caching will constitute fair use.
I have carried the USOL/HowTired hypothetical through the
four major factors of fair use. I conclude that a court would likely hold that
USOL's cache does not qualify as a fair use because: a) USOL is a
profit-seeking company and caching serves its commercial ends, b) USOL caches
HowTired's entire site, and c) USOL's caching impairs HowTired's advertising
market, its sole source of revenue. In many other situations, courts may find
caching to qualify as a fair use, but in at least some situations (such as in
the HowTired/USOL situation) a court may very well find that caching does not
qualify as a fair use. The very possibility that one's caching will subject one
to liability for "unfair" copyright infringement may be enough to
chill caching on the Internet.
IV. WILL AND/OR SHOULD PROXY CACHING SURVIVE?
Various factors exist which suggest that caching will or
should survive on the Internet despite the chilling reality that a basic
application of the fair use doctrine threatens to find some instances of caching
ineligible for the affirmative defense. One factor which may save caching is a
pre-infringement technological solution to caching's problems. The other two
factors which may save caching involve a more in-depth analysis and balancing
of the goals, history and policy behind copyright law and the economics of
caching within copyright law.
A. PRE-INFRINGEMENT TECHNOLOGICAL SOLUTIONS
The general sentiment on the WWW seems to favor caching.
Internet surfers prize expedited access and the free flow of information. The
thought of copyright law putting a damper on caching upsets these users.
Technological solutions may help these users avoid clashes with copyright law
by resolving the problems caching creates pre-infringement.
Several technological solutions exist for web content
providers to completely block caching of their site. Methods such as
instituting password protection or executable language protect websites from
caching. Password protection requires each individual user to type in her
identification information. A proxy cache server cannot perform this for each
user and therefore cannot gain access into and cache a password protected site.
A proxy cache similarly cannot cache executable script. While both these
methods circumvent caching, they may not be ideal options for the web content
provider. Web surfers generally do not like password protected sites; they like
to jump freely from one site to another. Password protection -- simply pausing
to remember and type in one's identification information -- deters a great many
potential viewers from ever entering one's site. Executable script does not
deter web surfers from entering one's site, but converting one's documents into
executable script instead of the traditional HTML requires a great deal more
expertise and time. Most significantly, by blocking all caching all the time,
methods such as password protection and executable script defeat all the
benefits of caching instead of just addressing caching's problems. It is not in
most sites' best interests to defeat caching altogether. Caching, by conserving
bandwidth and reducing access time, has a beneficial effect on the Internet and
in turn each individual website benefits.
As an alternative to blocking caching proxies altogether,
technologists have begun to develop mechanisms which preserve the
benefits of caching while addressing one of caching's most important flaws.
These alternative mechanisms strive to ensure that proxy caches do not issue
stale documents by communicating how long content servers wish their data to be
distributed through proxy caches. Examples of these mechanisms are:
1) Content server sends expiry date/time, proxy cache does
not retrieve a new copy until document expires. Documents can be sent
pre-expired if they ought not be cached;
2) When client requests copy of document that proxy has in
cache, proxy asks content server for headers including last change date/time,
and only fetches fresh copy of document contents if necessary;
3) Proxy sends a "conditional request" to content
server, "send me this document if it changed since date/time," server
sends a short standard "not modified" response if document has not changed.
Document expiry mechanisms such as these allow proxy caches
and content servers to work together to reduce network overhead. Content
providers can prevent the stale document problem by specifying early expiration
dates for entries that they expect to change frequently, but encourage
efficiency by specifying late expiration dates for entries that they expect to
remain unchanged.
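The interaction between expiry dates and conditional requests can be sketched in a few lines. This is a minimal illustration, not a description of any actual proxy product or of any proposed standard: the class names, the integer "tick" clock, and the lifetime field are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Document:
    body: str
    last_modified: int  # tick at which the content last changed
    lifetime: int       # ticks a cached copy may be served; 0 means "pre-expired"


class ContentServer:
    """Hypothetical content server that answers plain and conditional requests."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}
        self.full_responses = 0  # how many times the whole document was sent

    def get(self, url: str, now: int, if_modified_since: Optional[int] = None
            ) -> Tuple[Optional[str], int, int]:
        """Return (body, last_modified, expires); body is None if unchanged."""
        doc = self.docs[url]
        expires = now + doc.lifetime
        if if_modified_since is not None and doc.last_modified <= if_modified_since:
            return None, doc.last_modified, expires  # short "not modified" reply
        self.full_responses += 1
        return doc.body, doc.last_modified, expires


class ProxyCache:
    """Hypothetical proxy that honors expiry and revalidates expired copies."""

    def __init__(self, server: ContentServer) -> None:
        self.server = server
        self.entries: Dict[str, Tuple[str, int, int]] = {}

    def get(self, url: str, now: int) -> str:
        entry = self.entries.get(url)
        if entry is None:
            body, modified, expires = self.server.get(url, now)
            assert body is not None  # an unconditional request always has a body
            self.entries[url] = (body, modified, expires)
            return body
        body, modified, expires = entry
        if now < expires:
            return body  # copy has not expired: serve it without asking
        # Copy has expired: send a conditional request instead of refetching.
        new_body, new_modified, new_expires = self.server.get(
            url, now, if_modified_since=modified)
        if new_body is None:
            new_body = body  # server says unchanged: keep the stored copy
        self.entries[url] = (new_body, new_modified, new_expires)
        return new_body


server = ContentServer()
server.docs["/news"] = Document("v1", last_modified=0, lifetime=10)
proxy = ProxyCache(server)
first = proxy.get("/news", now=1)     # fetched from the server
server.docs["/news"] = Document("v2", last_modified=6, lifetime=10)
stale = proxy.get("/news", now=8)     # unexpired copy served, although changed
fresh = proxy.get("/news", now=12)    # expired: revalidated and refetched
again = proxy.get("/news", now=30)    # expired: revalidated, found unchanged
print(first, stale, fresh, again, server.full_responses)
```

In this sketch the request at tick 8 still returns the old version, because the document changed before its stated expiry. A document with a lifetime of 0 is revalidated on every request, which corresponds to sending it pre-expired.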
Document expiry header mechanisms probably represent the
practical solution for caching for the future. But, these mechanisms cannot
resolve caching problems today. In order for expiration dates and headers to
work, 1) the information must be encoded in a standard, and 2) the proxy server
must honor that standard. Today the Internet has no established standard, just
some proposed standards. Because caching is relatively new and no standard
exists, problems arise: many cache servers do not respect the encoded
expiration date, some administrators do not set a realistic expiration date,
and some documents carry no expiration date at all. Additionally, many of the documents in
the Web are "living" documents and specifying an expiry date for them
is generally a difficult task. A document may remain unchanged for a long time
and then suddenly change. The author of the document may not have foreseen this
change and adjusted her expiry information accordingly. It will take some time
and a significant amount of work to establish an agreed upon document expiry
system and then to work the kinks out of that system.
B. POLICY CONSIDERATIONS FOR COPYRIGHT LAW AND CACHING
POST-INFRINGEMENT: A HISTORICAL PERSPECTIVE
Before the days of photocopy machines and printing presses,
authors had little need for copyright law protection because no one had easy
mechanical means to copy another's work. As technology developed, copying
became easier and authors began to need protection in order to glean the fruits
from their labor. Copyright law emerged to protect authors' interests so that
they would not cease to create. Copyright law is founded on the assumption that
authors will decrease their output when copying prevents them from capturing
revenues for their work.
Electronic reproduction dramatically reduces the costs
associated with the production and dissemination of copies. Caching reproduces
original documents with virtually no information loss, and it actually costs
less (from a social perspective and perhaps from a private perspective) to
cache a document than to fetch the original. These facts have led many to the apparently
logical conclusion that caching must be stopped if authors are to continue to
produce their works in cyberspace.
The apparently logical conclusion (that caching equals easy
copies and must be stopped before it deters authors from producing) fails to
understand the technology and realities of cyberspace. Prior to the computer
age, copies made by photocopy machines and printing presses were possible, but
difficult, expensive, time consuming, imperfect, and physically tangible.
Copyright law predicated itself on the existing labored system of metering out
copies tangibly. Today, computers and the Internet not only make copying
effortless, cheap, quick, and perfect; they also make copying a fundamental
necessity. Caching, along with all basic computer functions, relies on
reproducing information. Computers are devices designed for rapid copying of
information. In order to load its "start-up" program, a computer must
"copy" the program from wherever it was previously stored into RAM
for execution. To view previously stored documents in one's word processor, the
computer must copy them into RAM so that they can appear on the screen. Reading
one's electronic mail involves many acts of copying both the program and the
letters' content. Browsing websites and viewing them on one's screen requires a
computer to copy data into RAM. The Internet simply could not exist without
computers engaging in constant copying.
In light of the digital era's reliance on copying, some view
cyberspace as "a kind of 'boundary' condition where certain fundamental
assumptions of the model break down." These people feel that the
historical assumption -- that lower copy costs equal lower incentive for
authors -- no longer holds completely true. While caching may result in less
production by some authors, the new age of the information highway gives birth
to a whole new breed of authors and an entirely different system of reaping
profits from one's intellectual property. To caching proponents, caching is a
powerful new tool for the production and distribution of creative works. They
suggest that disallowing caching will hinder the Internet and in turn stifle
the potential of the new breed of authors on the Internet.
C. POLICY CONSIDERATIONS FOR CACHING AND COPYRIGHT LAW
POST-INFRINGEMENT: AN ECONOMIC ANALYSIS
The Coase Theorem suggests that the choice between a
copyright regime which permits caching and a regime which prohibits caching is
allocatively neutral (i.e. it will produce an identical allocation of caching
"rights" and an identical incidence of caching behavior) as long as
transaction costs are absent. However, transaction costs are never absent. In
the caching scenario, the transaction costs are the costs of implementing a
smooth system for those parties who wish to participate in caching. The
question is who will bear the transaction costs of defining the parties'
respective obligations and negotiating between willing parties.
Economic efficiency requires placing transaction costs on
the lowest cost avoider. Under the old copyright regime where copies were
difficult and more rare, copyright law declared most copying illegal, thereby
placing the transaction cost (of negotiating a contract to copy) burden on the
person who wished to make copies. But the ubiquity of copying on computers and
the Internet may fundamentally change this analysis. Online technology demands
constant copying and most parties participate in and accept constant copying.
To the extent that many websites appreciate caching (especially when they can
control stale information with expiration headers), it is reasonable to suggest
that the likelihood of acceptance of the "I propose to copy this file --
may I do so?" bargain will be high. In such an environment, a copyright
law which declares most copying illegal and forces parties who wish to copy to
bear the transaction costs no longer remains the most efficient arrangement. Thus,
the transaction cost balance should shift.
A comparison of the efficiency of the two default rules (treating
silence as rejection or treating silence as acceptance) depends on the
distribution of contract acceptances and rejections. In a context where most
copyright holders reject offers (i.e. "May I copy?" answer, "No."), the common law
rule that silence is not acceptance is sensible and efficient. In such a context
the majority of transactions will consist of a single message: the copyright
holder can simply ignore any offer. However, in a context like the Internet,
where a copyright holder is likely to accept offers, the balance reverses. On
the Internet, a silence-acceptance rule minimizes transaction costs.
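The comparison can be made concrete with a simple message count. The model below is an assumption introduced for illustration, not part of the economic literature cited: every request to copy costs one message, and the copyright holder sends a reply only when her answer differs from what silence would mean under the governing rule.

```python
def expected_messages(acceptance_rate: float, silence_means_acceptance: bool) -> float:
    """Expected messages per copying request under a given default rule.

    Assumed model: the offer always costs one message; a reply is needed
    only when the holder's answer departs from the default that silence implies.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if silence_means_acceptance:
        reply_rate = 1.0 - acceptance_rate  # only rejections must be sent
    else:
        reply_rate = acceptance_rate        # only acceptances must be sent
    return 1.0 + reply_rate


def cheaper_rule(acceptance_rate: float) -> str:
    """Name the default rule with the lower expected message count."""
    as_acceptance = expected_messages(acceptance_rate, True)
    as_rejection = expected_messages(acceptance_rate, False)
    if as_acceptance < as_rejection:
        return "silence-acceptance"
    if as_rejection < as_acceptance:
        return "silence-rejection"
    return "either"


for rate in (0.1, 0.5, 0.9):
    print(rate, cheaper_rule(rate))
```

Under this model, if nine in ten copyright holders would consent, a silence-acceptance rule requires roughly 1.1 messages per request against roughly 1.9 under the common law rule. The advantage reverses once fewer than half would consent.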
CONCLUSION
Caching, by definition, is a form of copying. According to
U.S. copyright law, caching servers, when they cache copyrighted material,
commit copyright infringement. In some situations, the cacher guilty of
copyright infringement may successfully assert the affirmative defense of fair
use. In other situations, like the USOL/HowTired situation, a court may find
that caching does not constitute a fair use. The doctrine of fair use is highly
fact driven. It allows courts a great deal of room for subjectivity and
engenders unpredictability in the law. Unpredictable laws, laws which do not
clearly distinguish legal behavior from illegal behavior, tend to have an
overbroad chilling effect. Many cachers, upon hearing of their uncertain fate,
will likely choose to cease caching rather than risk facing a lawsuit and
potential liability.
Millions of bits of data traverse the Internet every day.
Much of this data is cached. The ultimate aim of the Copyright Act is not to
reward the labor of authors, but to promote the progress of science and the
useful arts. Copyright law needs to determine whether caching impedes or
enhances this goal. Three factors cut in favor of copyright law determining
that caching, at least to some extent, serves the goal of promoting art and
science. First, the fact that technology can tame the drawbacks of caching
while retaining most of the benefits tips the scales in favor of allowing
caching. Second, history has changed basic assumptions of copyright. Before,
easy copies threatened authors' ability to capture revenue. Today, authors on
the Internet are devising entirely different methods of capturing revenue from
their works. The supply of creative works on the Internet does not positively
correlate with strong protection against caching. Third, economics indicates
that a free caching default rule would efficiently lower the transaction costs
of negotiating for copies in a medium dominated by copy-making.
While I suggest that copyright law strive to update itself and accept caching, I suspect that in reality, the law will not matter on this issue. Change occurs far faster on the Internet than in our legislative meetings or courtrooms. The parties most affected by caching, the website content providers and the owners of proxy cache servers, have ample incentives to sort this issue out amongst themselves. The threat of a copyright infringement suit will bring cachers to the bargaining table. The reality that caching benefits the Internet and should continue will bring content websites to the bargaining table. The two sides will work out a technological solution (along the lines of the expiration headers) long before copyright law catches up.